Feature domains/foodrepo.org/cron#13
…s/barbase-tools into feature/foodrepo-parser
Pull Request Overview
This PR adds a FoodRepo data parser and synchronization tool for extracting product information from the FoodRepo API and syncing it to a BarBase database. The implementation includes retry logic, progress tracking, and Cloudflare bypass capabilities.
- Parser scripts to fetch product data from FoodRepo API with pagination and retry mechanisms
- Async sync script to upload/update products in BarBase database
- Documentation and configuration files for setup
Reviewed Changes
Copilot reviewed 5 out of 14 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| domains/foodrepo.org/parser.py | FoodRepo API parser with hardcoded API key extracting product data |
| domains/foodrepo.org/cron/sync_barbase.py | Async sync script with hardcoded API key for uploading to BarBase |
| domains/foodrepo.org/cron/foodrepo_parser.py | Duplicate parser with placeholder API key using xlarge images |
| domains/foodrepo.org/README.md | Documentation for parser setup and usage |
| COMMANDS.md | Virtual environment creation command |
| .gitignore | Adds venv and .idea/ to ignore list |
| .idea/* | IntelliJ IDEA configuration files |
Files not reviewed (6)
- .idea/.gitignore: Language not supported
- .idea/barbase-tools.iml: Language not supported
- .idea/inspectionProfiles/profiles_settings.xml: Language not supported
- .idea/misc.xml: Language not supported
- .idea/modules.xml: Language not supported
- .idea/vcs.xml: Language not supported
| API_KEY = "1b3483017a437b7920add1855664d43b" |
Hardcoded API key should not be committed to the repository. Move this to an environment variable or configuration file that is excluded from version control.
Suggested change:

    - API_KEY = "1b3483017a437b7920add1855664d43b"
    + import os
    + API_KEY = os.environ.get("FOODREPO_API_KEY")
    + if not API_KEY:
    +     raise RuntimeError("FOODREPO_API_KEY environment variable not set. Please set it before running this script.")

| API_KEY = "d9997c62-6b0c-4f61-9c7e-decae5d968a3" | ||
| STATE_FILE = "data/sync_state_v2.json" | ||
| FOODREPO_FILE = "data/foodrepo.json" | ||
Hardcoded API key should not be committed to the repository. Move this to an environment variable or configuration file that is excluded from version control.
Suggested change:

    - API_KEY = "d9997c62-6b0c-4f61-9c7e-decae5d968a3"
    - STATE_FILE = "data/sync_state_v2.json"
    - FOODREPO_FILE = "data/foodrepo.json"
    + API_KEY = os.environ.get("BARBASE_API_KEY")
    + STATE_FILE = "data/sync_state_v2.json"
    + FOODREPO_FILE = "data/foodrepo.json"
    + if not API_KEY:
    +     raise RuntimeError("BARBASE_API_KEY environment variable not set")

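Both suggestions above repeat the same read-and-validate pattern, so one option is a small shared helper. This is only a sketch: `require_env` is a hypothetical name that does not appear in the PR, and where it would live in the repository is left open.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        # Covers both "unset" and "set to an empty string".
        raise RuntimeError(f"{name} environment variable not set")
    return value


# Hypothetical usage in each script:
# API_KEY = require_env("FOODREPO_API_KEY")   # parser.py
# API_KEY = require_env("BARBASE_API_KEY")    # cron/sync_barbase.py
```

Since both keys have already been committed, moving them to environment variables does not make them secret again; rotating them would be the safer follow-up.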
| log.info(f"[+] Added image {url} to product {product_id}") | ||
| else: | ||
| log.warning(f"[WARN] Could not add image ({resp.status_code})") | ||
| log.warning(f"[WARN] Could not find images for ({product_id})") |
Line 111 executes unconditionally and always logs a warning even when images were successfully added. This warning should either be inside an 'else' block for the 'if images:' condition on line 103, or removed entirely if it's not needed.
Suggested change:

    - log.warning(f"[WARN] Could not find images for ({product_id})")
    + else:
    +     log.warning(f"[WARN] Could not find images for ({product_id})")

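To make the intended control flow concrete, here is a minimal sketch of the branch structure with the warning moved into the `else` of `if images:`. The function name `report_image_uploads`, the `upload` callable, and the 200/201 success check are assumptions for illustration; the actual request code in `sync_barbase.py` is not shown in this review.

```python
import logging

log = logging.getLogger("sync_barbase")


def report_image_uploads(product_id, images, upload):
    """Upload each image; warn about missing images only when there are none.

    `upload` is a hypothetical callable returning an HTTP status code.
    """
    if images:
        for url in images:
            status = upload(product_id, url)
            if status in (200, 201):  # assumed success codes
                log.info("[+] Added image %s to product %s", url, product_id)
            else:
                log.warning("[WARN] Could not add image (%s)", status)
    else:
        # Reached only when the product has no images at all.
        log.warning("[WARN] Could not find images for (%s)", product_id)
```

With this shape, a product whose images all upload successfully produces no warning.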
| BARBASE_API = "https://bb.solutionary.me/api/v1" | ||
| API_KEY = "d9997c62-6b0c-4f61-9c7e-decae5d968a3" | ||
| STATE_FILE = "data/sync_state_v2.json" | ||
| FOODREPO_FILE = "data/foodrepo.json" |
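The error message on line 178 references 'foodrepo_data.json', but the constant FOODREPO_FILE is set to 'data/foodrepo.json'. The filenames are inconsistent, so the message points at a file the script never opens. Building the message from FOODREPO_FILE would keep the two in sync.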
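One way to avoid this kind of drift is to interpolate the constant into the message instead of repeating the filename. The sketch below assumes a loader function named `load_foodrepo`, which is hypothetical; only `FOODREPO_FILE` and its value come from the PR.

```python
import json
import os

FOODREPO_FILE = "data/foodrepo.json"


def load_foodrepo(path: str = FOODREPO_FILE):
    """Load the parsed FoodRepo dump, naming the actual path on failure."""
    if not os.path.exists(path):
        # The message is built from the same path that would have been opened.
        raise FileNotFoundError(f"FoodRepo data file not found: {path}")
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```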
| save_json(STATE_FILE, updated_state) | ||
| log.info("[STATE] Final state saved") | ||
| save_json(STATE_FILE, updated_state) |
The state is saved twice in a row: once inside the async context manager (line 192) and again immediately after it (line 194). Remove the duplicate save on line 194.
Suggested change:

    - save_json(STATE_FILE, updated_state)

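For reference, a minimal sketch of the resulting flow, with a single save kept inside the context manager. `open_session`, `run_sync`, and the recording `save_json` below are stand-ins invented for this example; the real script's session setup and sync logic are not shown in the review.

```python
import asyncio
import json
from contextlib import asynccontextmanager

saves = []  # records each save so the sketch can be checked


def save_json(path, data):
    """Stand-in for the script's save_json; it only records the call."""
    saves.append((path, json.dumps(data)))


@asynccontextmanager
async def open_session():
    """Hypothetical placeholder for the HTTP client session."""
    yield object()


async def run_sync(state_file, state):
    updated_state = dict(state)
    async with open_session():
        updated_state["synced"] = True  # placeholder for the real sync work
        # Final state is written exactly once, before the session closes.
        save_json(state_file, updated_state)
    return updated_state
```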
| name = product.get("display_name_translations", {}).get(lang) or \ | ||
| product.get("display_name_translations", {}).get("en", "Not found") | ||
| raw_images = product.get("images", []) | ||
| images = [img.get("large") for img in raw_images if "large" in img] |
The parser extracts 'large' image format while the duplicate parser in foodrepo_parser.py extracts 'xlarge' format (line 54). This inconsistency could lead to different image qualities being processed. Consider standardizing to one image size across both files.
Suggested change:

    - images = [img.get("large") for img in raw_images if "large" in img]
    + images = [img.get("xlarge") for img in raw_images if "xlarge" in img]

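If both parsers are kept, the size could come from one shared setting rather than a literal in each file. The following is a sketch under that assumption; `IMAGE_SIZE` and `extract_images` are hypothetical names, and unlike the original comprehension this version also skips entries whose value for the key is empty or `None`.

```python
from typing import Any, Dict, List

IMAGE_SIZE = "xlarge"  # assumption: single shared setting for both parsers


def extract_images(product: Dict[str, Any], size: str = IMAGE_SIZE) -> List[str]:
    """Return the URLs of the requested size, skipping images that lack it."""
    return [img[size] for img in product.get("images", []) if img.get(size)]
```

Removing the duplicate parser altogether would address the inconsistency more directly, if `foodrepo_parser.py` is not needed separately.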
| import random | ||
| import re | ||
| from datetime import datetime, timezone | ||
| from typing import Any, List, Dict |
Import of 'Dict' is not used.
Suggested change:

    - from typing import Any, List, Dict
    + from typing import Any, List
No description provided.